Intro
The dataset contains 615 records from patients aged 19 to 77: 377 men
and 238 women. The records are split into 5 categories; among them,
category 1 marks patients with the lowest level of hepatitis, 2 marks
more advanced stages that include fibrosis, 3 marks cirrhosis, and 0
marks controls. The first rows of the dataset are shown below.
| Category | Age | Sex | ALB | ALP | ALT | AST | BIL | CHE | CHOL | CREA | GGT | PROT | severity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0=Blood Donor | 49 | m | 39.1 | 62.1 | 23.8 | 19.6 | 3.5 | 9.19 | 4.82 | 85.0 | 19.4 | 69.8 | 0 |
| 2=Fibrosis | 29 | m | 41.0 | 43.1 | 2.4 | 83.5 | 6.0 | 11.49 | 5.42 | 55.2 | 130.0 | 66.5 | 2 |
| 0=Blood Donor | 57 | m | 43.3 | 86.8 | 21.2 | 22.2 | 6.8 | 7.87 | 4.91 | 65.0 | 19.2 | 71.3 | 0 |
| 0=Blood Donor | 46 | f | 36.7 | 62.3 | 10.8 | 17.4 | 3.7 | 6.17 | 4.07 | 67.0 | 15.1 | 69.0 | 0 |
| 3=Cirrhosis | 51 | m | 33.0 | 29.6 | 4.5 | 66.6 | 91.0 | 4.02 | 4.08 | 75.9 | 28.5 | 62.3 | 3 |
| 0=Blood Donor | 32 | m | 46.9 | 74.7 | 36.2 | 52.6 | 6.1 | 8.84 | 5.20 | 86.0 | 33.2 | 79.3 | 0 |
| 0=Blood Donor | 57 | f | 44.7 | 60.6 | 16.5 | 24.3 | 4.2 | 10.47 | 4.90 | 68.0 | 15.9 | 68.5 | 0 |
| 1=Hepatitis | 19 | m | 41.0 | NA | 87.0 | 67.0 | 12.0 | 7.55 | 3.90 | 62.0 | 65.0 | 75.0 | 1 |
| 1=Hepatitis | 46 | m | 48.0 | 59.5 | 11.6 | 39.0 | 7.0 | 16.41 | 4.65 | 66.4 | 158.2 | 72.7 | 1 |
| 1=Hepatitis | 32 | m | 41.0 | 34.4 | 12.1 | 60.9 | 6.0 | 13.80 | 5.48 | 45.4 | 33.1 | 71.1 | 1 |
Boxplots
By variable
Boxplots of the 10 numeric clinical variables. A large number of
extreme values can be seen.
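As a rough illustration of what "extreme value" means in a boxplot, the sketch below applies the usual 1.5 × IQR fence rule to the BIL values of the ten sample rows shown in the Intro table. The function name `iqr_outliers` is hypothetical, and the quartile method (`inclusive`) is an assumption; R's boxplot uses hinges, which can differ slightly on small samples.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR], sorted ascending."""
    # Quartiles by linear interpolation over the observed range (assumed method).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return sorted(v for v in values if v < lower or v > upper)


# BIL values of the ten sample rows from the Intro table.
bil_sample = [3.5, 6.0, 6.8, 3.7, 91.0, 6.1, 4.2, 12.0, 7.0, 6.0]
print(iqr_outliers(bil_sample))  # [12.0, 91.0]
```

Even in this small sample, two of the ten BIL values fall beyond the upper fence.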
By severity
Boxplots of the 10 numeric clinical variables, grouped by severity.
Even after grouping, a large number of extreme values remain.
Boxplots by variable
The variables ALB, AST, BIL, CHOL and GGT show trends with respect to
severity.
Correlations
Overall, the variables are only weakly correlated with one another.
Heatmap

PCA
The principal component analysis (PCA) shows no clear grouping of the
variables across the first 5 components, which together explain 70% of
the variance.
Eigenvalues
| Component | Eigenvalue | Variance (%) | Cumulative variance (%) |
|---|---|---|---|
| Dim.1 | 2.4617204 | 22.379276 | 22.37928 |
| Dim.2 | 1.8176773 | 16.524339 | 38.90362 |
| Dim.3 | 1.3660444 | 12.418586 | 51.32220 |
| Dim.4 | 1.0954300 | 9.958455 | 61.28066 |
| Dim.5 | 0.9630787 | 8.755261 | 70.03592 |
| Dim.6 | 0.7610772 | 6.918883 | 76.95480 |
| Dim.7 | 0.7009821 | 6.372564 | 83.32737 |
| Dim.8 | 0.5873901 | 5.339910 | 88.66727 |
| Dim.9 | 0.4932310 | 4.483918 | 93.15119 |
| Dim.10 | 0.4192509 | 3.811372 | 96.96256 |
| Dim.11 | 0.3341179 | 3.037435 | 100.00000 |
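The variance columns follow directly from the eigenvalues: each percentage is the eigenvalue divided by the sum of all eigenvalues (here 11, one per standardized variable). The sketch below recomputes them; the function name `explained_variance` is hypothetical, and the eigenvalues are copied from the table.

```python
def explained_variance(eigenvalues):
    """Return (percent, cumulative_percent) of variance for each component."""
    total = sum(eigenvalues)
    percent = [100.0 * ev / total for ev in eigenvalues]
    cumulative, running = [], 0.0
    for p in percent:
        running += p
        cumulative.append(running)
    return percent, cumulative


# Eigenvalues of Dim.1 to Dim.11, copied from the table.
eigenvalues = [2.4617204, 1.8176773, 1.3660444, 1.0954300, 0.9630787,
               0.7610772, 0.7009821, 0.5873901, 0.4932310, 0.4192509,
               0.3341179]

percent, cumulative = explained_variance(eigenvalues)
for i, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"Dim.{i}: {p:.2f}% (cumulative {c:.2f}%)")
```

The fifth cumulative value is where the roughly 70% figure for the first 5 components comes from.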
Contributions
| Variable | Dim.1 | Dim.2 | Dim.3 | Dim.4 | Dim.5 |
|---|---|---|---|---|---|
| Age | 3.1218612 | 1.3524742 | 20.2942490 | 8.2501905 | 20.2651140 |
| ALB | 19.7965348 | 3.1421703 | 8.3257415 | 2.2799606 | 3.1052366 |
| ALP | 3.4172926 | 16.7664193 | 13.1876036 | 6.1106804 | 0.1512957 |
| ALT | 0.3765221 | 18.8413903 | 1.5758660 | 7.8010735 | 30.0276532 |
| AST | 13.6010029 | 8.0193009 | 15.0365244 | 2.9821453 | 0.0130968 |
| BIL | 11.6874143 | 0.0018451 | 8.4922394 | 0.6616438 | 24.4104510 |
| CHE | 17.3078777 | 9.2711447 | 1.9590093 | 0.1981358 | 1.4800628 |
| CHOL | 8.1937629 | 9.9384921 | 11.7619771 | 2.0168415 | 9.4439781 |
| CREA | 0.4213311 | 0.5841866 | 1.2878061 | 68.1197549 | 0.6664951 |
| GGT | 12.0616201 | 22.2946873 | 0.0827347 | 1.0487663 | 0.0994636 |
| PROT | 10.0147803 | 9.7878893 | 17.9962488 | 0.5308073 | 10.3371533 |
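One common way to read a contribution table is to compare each value with the contribution a variable would have if all 11 contributed equally (100 / 11 ≈ 9.09%). The sketch below applies that benchmark to Dim.1 and Dim.4; the benchmark is a convention assumed here for illustration, not something the report states, and the function name `above_uniform` is hypothetical.

```python
def above_uniform(contributions):
    """Names of variables whose contribution exceeds the equal-share value,
    ordered from largest to smallest contribution."""
    threshold = 100.0 / len(contributions)
    kept = [(name, value) for name, value in contributions.items()
            if value > threshold]
    return [name for name, _ in sorted(kept, key=lambda item: item[1],
                                       reverse=True)]


# Contributions (%) copied from the Dim.1 and Dim.4 columns of the table.
dim1 = {"Age": 3.1218612, "ALB": 19.7965348, "ALP": 3.4172926,
        "ALT": 0.3765221, "AST": 13.6010029, "BIL": 11.6874143,
        "CHE": 17.3078777, "CHOL": 8.1937629, "CREA": 0.4213311,
        "GGT": 12.0616201, "PROT": 10.0147803}
dim4 = {"Age": 8.2501905, "ALB": 2.2799606, "ALP": 6.1106804,
        "ALT": 7.8010735, "AST": 2.9821453, "BIL": 0.6616438,
        "CHE": 0.1981358, "CHOL": 2.0168415, "CREA": 68.1197549,
        "GGT": 1.0487663, "PROT": 0.5308073}

print(above_uniform(dim1))  # ['ALB', 'CHE', 'AST', 'GGT', 'BIL', 'PROT']
print(above_uniform(dim4))  # ['CREA']
```

Under this reading, Dim.1 is spread over six variables while Dim.4 is dominated by CREA alone.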
Predictive models
Four different models were trained to predict the presence or absence
of infection and to predict the severity of infection. Because the data
contain many extreme values and are not normally distributed, robust
models that are less sensitive to extreme values were used:
- Random forest (RF), using the randomForest library.
- Multinomial log-linear model fitted via neural networks (LOG), using
the nnet library.
- Gaussian naive Bayes (GNB), using the e1071 library.
- K nearest neighbors (KNN), through the caret library.
Infection prediction
Random Forest (RF)
| Prediction | Reference: 0 | Reference: 1 |
|---|---|---|
| 0 | 162 | 0 |
| 1 | 0 | 23 |
Multinomial log-linear (LOG)
| Prediction | Reference: 0 | Reference: 1 |
|---|---|---|
| 0 | 162 | 0 |
| 1 | 0 | 23 |
Gaussian naive bayes (GNB)
| Prediction | Reference: 0 | Reference: 1 |
|---|---|---|
| 0 | 161 | 7 |
| 1 | 1 | 16 |
K nearest neighbors (KNN)
| Prediction | Reference: 0 | Reference: 1 |
|---|---|---|
| 0 | 162 | 1 |
| 1 | 0 | 22 |
Severity prediction
Random Forest (RF)
| Prediction | Reference: 1 | Reference: 2 | Reference: 3 |
|---|---|---|---|
| 1 | 4 | 2 | 0 |
| 2 | 4 | 4 | 1 |
| 3 | 0 | 1 | 8 |
Multinomial log-linear (LOG)
| Prediction | Reference: 1 | Reference: 2 | Reference: 3 |
|---|---|---|---|
| 1 | 4 | 2 | 0 |
| 2 | 1 | 3 | 2 |
| 3 | 3 | 2 | 7 |
Gaussian naive bayes (GNB)
| Prediction | Reference: 1 | Reference: 2 | Reference: 3 |
|---|---|---|---|
| 1 | 2 | 0 | 0 |
| 2 | 6 | 7 | 1 |
| 3 | 0 | 0 | 8 |
K nearest neighbors (KNN)
| Prediction | Reference: 1 | Reference: 2 | Reference: 3 |
|---|---|---|---|
| 1 | 5 | 3 | 0 |
| 2 | 3 | 4 | 2 |
| 3 | 0 | 0 | 7 |
Model results
LOG classifies presence/absence of infection perfectly (accuracy and
kappa of 1.0) but is the least accurate model for severity (accuracy
0.58, kappa 0.37). RF is equally accurate for presence/absence of
infection. GNB, RF and KNN reach similar accuracy (0.67 to 0.71) and
kappa (0.50 to 0.57) for severity.
Infection
| Accuracy | Kappa | AccuracyLower | AccuracyUpper | AccuracyNull | AccuracyPValue | McnemarPValue | Method |
|---|---|---|---|---|---|---|---|
| 1.0000000 | 1.0000000 | 0.9802576 | 1.0000000 | 0.8756757 | 0.0000000 | NaN | RF |
| 1.0000000 | 1.0000000 | 0.9802576 | 1.0000000 | 0.8756757 | 0.0000000 | NaN | LOG |
| 0.9567568 | 0.7763675 | 0.9165742 | 0.9811485 | 0.8756757 | 0.0001484 | 0.0770999 | GNB |
| 0.9945946 | 0.9747026 | 0.9702525 | 0.9998632 | 0.8756757 | 0.0000000 | 1.0000000 | KNN |
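The AccuracyNull and AccuracyPValue columns can be read as follows, assuming the table was produced with caret's `confusionMatrix` (the column names match its overall statistics, but the report does not say so explicitly). AccuracyNull is the share of the most frequent reference class, and AccuracyPValue is a one-sided exact binomial test of whether accuracy exceeds that share. The sketch below reproduces both for the GNB infection matrix, plus the lower confidence limit for a perfect classifier on 185 cases; all function names are hypothetical.

```python
from math import comb


def no_information_rate(matrix):
    """Share of the most frequent reference class (columns = reference)."""
    n = sum(sum(row) for row in matrix)
    column_totals = [sum(col) for col in zip(*matrix)]
    return max(column_totals) / n


def accuracy_p_value(matrix):
    """One-sided exact binomial test: P(X >= correct), X ~ Binomial(n, NIR)."""
    n = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    p = no_information_rate(matrix)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(correct, n + 1))


def lower_bound_when_perfect(n, alpha=0.05):
    """Clopper-Pearson lower limit in the special case of n correct out of n."""
    return (alpha / 2) ** (1 / n)


# GNB infection matrix: rows are predictions, columns are reference classes.
gnb_infection = [[161, 7],
                 [1, 16]]

print(round(no_information_rate(gnb_infection), 7))   # AccuracyNull
print(round(accuracy_p_value(gnb_infection), 7))      # AccuracyPValue
print(round(lower_bound_when_perfect(185), 7))        # AccuracyLower, RF/LOG
```

Because 162 of the 185 test cases are controls, a model that always predicts "no infection" would already reach about 0.876 accuracy, which is why kappa and the p-value are reported alongside accuracy.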
Severity
| Accuracy | Kappa | AccuracyLower | AccuracyUpper | AccuracyNull | AccuracyPValue | McnemarPValue | Method |
|---|---|---|---|---|---|---|---|
| 0.6666667 | 0.5000000 | 0.4467804 | 0.8436977 | 0.375 | 0.0035690 | NaN | RF |
| 0.5833333 | 0.3650794 | 0.3664306 | 0.7789031 | 0.375 | 0.0307271 | 0.3430301 | LOG |
| 0.7083333 | 0.5692308 | 0.4890522 | 0.8738479 | 0.375 | 0.0009502 | NaN | GNB |
| 0.6666667 | 0.5025907 | 0.4467804 | 0.8436977 | 0.375 | 0.0035690 | NaN | KNN |
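Accuracy and Cohen's kappa can be recomputed from any of the confusion matrices reported earlier. The sketch below does so for two of them (GNB on infection, KNN on severity); the function name `accuracy_and_kappa` is hypothetical, and rows are taken as predictions and columns as reference classes.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix."""
    n = sum(sum(row) for row in matrix)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    # Agreement expected by chance from the row and column totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


gnb_infection = [[161, 7],
                 [1, 16]]
knn_severity = [[5, 3, 0],
                [3, 4, 2],
                [0, 0, 7]]

for name, matrix in [("GNB infection", gnb_infection),
                     ("KNN severity", knn_severity)]:
    acc, kappa = accuracy_and_kappa(matrix)
    print(f"{name}: accuracy={acc:.7f}, kappa={kappa:.7f}")
```

Kappa corrects accuracy for chance agreement, so it drops more sharply than accuracy when the classes are unbalanced, as in the infection task.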